{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# 39 - Athena Iceberg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Athena supports read, time travel, write, and DDL queries for Apache Iceberg tables that use the Apache Parquet format for data and the AWS Glue catalog for their metastore. More in [User Guide](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Create Iceberg table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import getpass\n",
    "\n",
    "bucket_name = getpass.getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "True"
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import awswrangler as wr\n",
    "\n",
    "glue_database = \"aws_sdk_pandas\"\n",
    "glue_table = \"iceberg_test\"\n",
    "path = f\"s3://{bucket_name}/iceberg_test/\"\n",
    "temp_path = f\"s3://{bucket_name}/iceberg_test_temp/\"\n",
    "\n",
    "# Cleanup table before create\n",
    "wr.catalog.delete_table_if_exists(database=glue_database, table=glue_table)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Create table & insert data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "It is possible to insert Pandas data frame into Iceberg table using `wr.athena.to_iceberg`. If the table does not exist, it will be created:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame({\"id\": [1, 2, 3], \"name\": [\"John\", \"Lily\", \"Richard\"]})\n",
    "\n",
    "wr.athena.to_iceberg(\n",
    "    df=df,\n",
    "    database=glue_database,\n",
    "    table=glue_table,\n",
    "    table_location=path,\n",
    "    temp_path=temp_path,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Alternatively, it is also possible to insert by directly running `INSERT INTO ... VALUES`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'QueryExecutionId': 'e339fcd2-9db1-43ac-bb9e-9730e6395b51',\n",
       " 'Query': \"INSERT INTO iceberg_test VALUES (1,'John'), (2, 'Lily'), (3, 'Richard')\",\n",
       " 'StatementType': 'DML',\n",
       " 'ResultConfiguration': {'OutputLocation': 's3://aws-athena-query-results-...-us-east-1/e339fcd2-9db1-43ac-bb9e-9730e6395b51'},\n",
       " 'ResultReuseConfiguration': {'ResultReuseByAgeConfiguration': {'Enabled': False}},\n",
       " 'QueryExecutionContext': {'Database': 'aws_sdk_pandas'},\n",
       " 'Status': {'State': 'SUCCEEDED',\n",
       "  'SubmissionDateTime': datetime.datetime(2023, 3, 16, 10, 40, 8, 612000, tzinfo=tzlocal()),\n",
       "  'CompletionDateTime': datetime.datetime(2023, 3, 16, 10, 40, 11, 143000, tzinfo=tzlocal())},\n",
       " 'Statistics': {'EngineExecutionTimeInMillis': 2242,\n",
       "  'DataScannedInBytes': 0,\n",
       "  'DataManifestLocation': 's3://aws-athena-query-results-...-us-east-1/e339fcd2-9db1-43ac-bb9e-9730e6395b51-manifest.csv',\n",
       "  'TotalExecutionTimeInMillis': 2531,\n",
       "  'QueryQueueTimeInMillis': 241,\n",
       "  'QueryPlanningTimeInMillis': 179,\n",
       "  'ServiceProcessingTimeInMillis': 48,\n",
       "  'ResultReuseInformation': {'ReusedPreviousResult': False}},\n",
       " 'WorkGroup': 'primary',\n",
       " 'EngineVersion': {'SelectedEngineVersion': 'Athena engine version 3',\n",
       "  'EffectiveEngineVersion': 'Athena engine version 3'}}"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.start_query_execution(\n",
    "    sql=f\"INSERT INTO {glue_table} VALUES (1,'John'), (2, 'Lily'), (3, 'Richard')\",\n",
    "    database=glue_database,\n",
    "    wait=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'QueryExecutionId': '922c8f02-4c00-4050-b4a7-7016809efa2b',\n",
       " 'Query': \"INSERT INTO iceberg_test VALUES (4,'Anne'), (5, 'Jacob'), (6, 'Leon')\",\n",
       " 'StatementType': 'DML',\n",
       " 'ResultConfiguration': {'OutputLocation': 's3://aws-athena-query-results-...-us-east-1/922c8f02-4c00-4050-b4a7-7016809efa2b'},\n",
       " 'ResultReuseConfiguration': {'ResultReuseByAgeConfiguration': {'Enabled': False}},\n",
       " 'QueryExecutionContext': {'Database': 'aws_sdk_pandas'},\n",
       " 'Status': {'State': 'SUCCEEDED',\n",
       "  'SubmissionDateTime': datetime.datetime(2023, 3, 16, 10, 40, 24, 582000, tzinfo=tzlocal()),\n",
       "  'CompletionDateTime': datetime.datetime(2023, 3, 16, 10, 40, 27, 352000, tzinfo=tzlocal())},\n",
       " 'Statistics': {'EngineExecutionTimeInMillis': 2414,\n",
       "  'DataScannedInBytes': 0,\n",
       "  'DataManifestLocation': 's3://aws-athena-query-results-...-us-east-1/922c8f02-4c00-4050-b4a7-7016809efa2b-manifest.csv',\n",
       "  'TotalExecutionTimeInMillis': 2770,\n",
       "  'QueryQueueTimeInMillis': 329,\n",
       "  'QueryPlanningTimeInMillis': 189,\n",
       "  'ServiceProcessingTimeInMillis': 27,\n",
       "  'ResultReuseInformation': {'ReusedPreviousResult': False}},\n",
       " 'WorkGroup': 'primary',\n",
       " 'EngineVersion': {'SelectedEngineVersion': 'Athena engine version 3',\n",
       "  'EffectiveEngineVersion': 'Athena engine version 3'}}"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.start_query_execution(\n",
    "    sql=f\"INSERT INTO {glue_table} VALUES (4,'Anne'), (5, 'Jacob'), (6, 'Leon')\",\n",
    "    database=glue_database,\n",
    "    wait=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Query\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>Anne</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Lily</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Richard</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Jacob</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Leon</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id     name\n",
       "0   1     John\n",
       "1   4     Anne\n",
       "2   2     Lily\n",
       "3   3  Richard\n",
       "4   5    Jacob\n",
       "5   6     Leon"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.read_sql_query(\n",
    "    sql=f'SELECT * FROM \"{glue_table}\"',\n",
    "    database=glue_database,\n",
    "    ctas_approach=False,\n",
    "    unload_approach=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Read query metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In a SELECT query, you can use the following properties after `table_name` to query Iceberg table metadata:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "- `$files` Shows a table's current data files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "- `$manifests` Shows a table's current file manifests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "- `$history` Shows a table's history"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "- `$partitions` Shows a table's current partitions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>content</th>\n",
       "      <th>file_path</th>\n",
       "      <th>file_format</th>\n",
       "      <th>record_count</th>\n",
       "      <th>file_size_in_bytes</th>\n",
       "      <th>column_sizes</th>\n",
       "      <th>value_counts</th>\n",
       "      <th>null_value_counts</th>\n",
       "      <th>nan_value_counts</th>\n",
       "      <th>lower_bounds</th>\n",
       "      <th>upper_bounds</th>\n",
       "      <th>key_metadata</th>\n",
       "      <th>split_offsets</th>\n",
       "      <th>equality_ids</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>s3://.../iceberg_test/data/089a...</td>\n",
       "      <td>PARQUET</td>\n",
       "      <td>3</td>\n",
       "      <td>360</td>\n",
       "      <td>{1=48, 2=63}</td>\n",
       "      <td>{1=3, 2=3}</td>\n",
       "      <td>{1=0, 2=0}</td>\n",
       "      <td>{}</td>\n",
       "      <td>{1=1, 2=John}</td>\n",
       "      <td>{1=3, 2=Richard}</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>s3://.../iceberg_test/data/5736...</td>\n",
       "      <td>PARQUET</td>\n",
       "      <td>3</td>\n",
       "      <td>355</td>\n",
       "      <td>{1=48, 2=61}</td>\n",
       "      <td>{1=3, 2=3}</td>\n",
       "      <td>{1=0, 2=0}</td>\n",
       "      <td>{}</td>\n",
       "      <td>{1=4, 2=Anne}</td>\n",
       "      <td>{1=6, 2=Leon}</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   content                                          file_path file_format  \\\n",
       "0        0  s3://.../iceberg_test/data/089a...     PARQUET   \n",
       "1        0  s3://.../iceberg_test/data/5736...     PARQUET   \n",
       "\n",
       "   record_count  file_size_in_bytes  column_sizes value_counts  \\\n",
       "0             3                 360  {1=48, 2=63}   {1=3, 2=3}   \n",
       "1             3                 355  {1=48, 2=61}   {1=3, 2=3}   \n",
       "\n",
       "  null_value_counts nan_value_counts   lower_bounds      upper_bounds  \\\n",
       "0        {1=0, 2=0}               {}  {1=1, 2=John}  {1=3, 2=Richard}   \n",
       "1        {1=0, 2=0}               {}  {1=4, 2=Anne}     {1=6, 2=Leon}   \n",
       "\n",
       "  key_metadata split_offsets equality_ids  \n",
       "0         <NA>           NaN          NaN  \n",
       "1         <NA>           NaN          NaN  "
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.read_sql_query(\n",
    "    sql=f'SELECT * FROM \"{glue_table}$files\"',\n",
    "    database=glue_database,\n",
    "    ctas_approach=False,\n",
    "    unload_approach=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>path</th>\n",
       "      <th>length</th>\n",
       "      <th>partition_spec_id</th>\n",
       "      <th>added_snapshot_id</th>\n",
       "      <th>added_data_files_count</th>\n",
       "      <th>added_rows_count</th>\n",
       "      <th>existing_data_files_count</th>\n",
       "      <th>existing_rows_count</th>\n",
       "      <th>deleted_data_files_count</th>\n",
       "      <th>deleted_rows_count</th>\n",
       "      <th>partitions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>s3://.../iceberg_test/metadata/...</td>\n",
       "      <td>6538</td>\n",
       "      <td>0</td>\n",
       "      <td>4379263637983206651</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>s3://.../iceberg_test/metadata/...</td>\n",
       "      <td>6548</td>\n",
       "      <td>0</td>\n",
       "      <td>2934717851675145063</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                path  length  \\\n",
       "0  s3://.../iceberg_test/metadata/...    6538   \n",
       "1  s3://.../iceberg_test/metadata/...    6548   \n",
       "\n",
       "   partition_spec_id    added_snapshot_id  added_data_files_count  \\\n",
       "0                  0  4379263637983206651                       1   \n",
       "1                  0  2934717851675145063                       1   \n",
       "\n",
       "   added_rows_count  existing_data_files_count  existing_rows_count  \\\n",
       "0                 3                          0                    0   \n",
       "1                 3                          0                    0   \n",
       "\n",
       "   deleted_data_files_count  deleted_rows_count partitions  \n",
       "0                         0                   0         []  \n",
       "1                         0                   0         []  "
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.read_sql_query(\n",
    "    sql=f'SELECT * FROM \"{glue_table}$manifests\"',\n",
    "    database=glue_database,\n",
    "    ctas_approach=False,\n",
    "    unload_approach=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>made_current_at</th>\n",
       "      <th>snapshot_id</th>\n",
       "      <th>parent_id</th>\n",
       "      <th>is_current_ancestor</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2023-03-16 09:40:10.438000+00:00</td>\n",
       "      <td>2934717851675145063</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2023-03-16 09:40:26.754000+00:00</td>\n",
       "      <td>4379263637983206651</td>\n",
       "      <td>2934717851675144704</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   made_current_at          snapshot_id            parent_id  \\\n",
       "0 2023-03-16 09:40:10.438000+00:00  2934717851675145063                 <NA>   \n",
       "1 2023-03-16 09:40:26.754000+00:00  4379263637983206651  2934717851675144704   \n",
       "\n",
       "   is_current_ancestor  \n",
       "0                 True  \n",
       "1                 True  "
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = wr.athena.read_sql_query(\n",
    "    sql=f'SELECT * FROM \"{glue_table}$history\"',\n",
    "    database=glue_database,\n",
    "    ctas_approach=False,\n",
    "    unload_approach=False,\n",
    ")\n",
    "\n",
    "# Save snapshot id\n",
    "snapshot_id = df.snapshot_id[0]\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>record_count</th>\n",
       "      <th>file_count</th>\n",
       "      <th>total_size</th>\n",
       "      <th>data</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>715</td>\n",
       "      <td>{id={min=1, max=6, null_count=0, nan_count=nul...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   record_count  file_count  total_size  \\\n",
       "0             6           2         715   \n",
       "\n",
       "                                                data  \n",
       "0  {id={min=1, max=6, null_count=0, nan_count=nul...  "
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.read_sql_query(\n",
    "    sql=f'SELECT * FROM \"{glue_table}$partitions\"',\n",
    "    database=glue_database,\n",
    "    ctas_approach=False,\n",
    "    unload_approach=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Time travel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>Anne</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Lily</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Richard</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Jacob</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Leon</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id     name\n",
       "0   1     John\n",
       "1   4     Anne\n",
       "2   2     Lily\n",
       "3   3  Richard\n",
       "4   5    Jacob\n",
       "5   6     Leon"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.read_sql_query(\n",
    "    sql=f\"SELECT * FROM {glue_table} FOR TIMESTAMP AS OF (current_timestamp - interval '5' second)\",\n",
    "    database=glue_database,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Version travel\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>John</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Lily</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Richard</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id     name\n",
       "0   1     John\n",
       "1   2     Lily\n",
       "2   3  Richard"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.read_sql_query(\n",
    "    sql=f\"SELECT * FROM {glue_table} FOR VERSION AS OF {snapshot_id}\",\n",
    "    database=glue_database,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Optimize"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The `OPTIMIZE table REWRITE DATA` compaction action rewrites data files into a more optimized layout based on their size and number of associated delete files. For syntax and table property details, see [OPTIMIZE](https://docs.aws.amazon.com/athena/latest/ug/optimize-statement.html).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'QueryExecutionId': '94666790-03ae-42d7-850a-fae99fa79a68',\n",
       " 'Query': 'OPTIMIZE iceberg_test REWRITE DATA USING BIN_PACK',\n",
       " 'StatementType': 'DDL',\n",
       " 'ResultConfiguration': {'OutputLocation': 's3://aws-athena-query-results-...-us-east-1/tables/94666790-03ae-42d7-850a-fae99fa79a68'},\n",
       " 'ResultReuseConfiguration': {'ResultReuseByAgeConfiguration': {'Enabled': False}},\n",
       " 'QueryExecutionContext': {'Database': 'aws_sdk_pandas'},\n",
       " 'Status': {'State': 'SUCCEEDED',\n",
       "  'SubmissionDateTime': datetime.datetime(2023, 3, 16, 10, 49, 42, 857000, tzinfo=tzlocal()),\n",
       "  'CompletionDateTime': datetime.datetime(2023, 3, 16, 10, 49, 45, 655000, tzinfo=tzlocal())},\n",
       " 'Statistics': {'EngineExecutionTimeInMillis': 2622,\n",
       "  'DataScannedInBytes': 220,\n",
       "  'DataManifestLocation': 's3://aws-athena-query-results-...-us-east-1/tables/94666790-03ae-42d7-850a-fae99fa79a68-manifest.csv',\n",
       "  'TotalExecutionTimeInMillis': 2798,\n",
       "  'QueryQueueTimeInMillis': 124,\n",
       "  'QueryPlanningTimeInMillis': 252,\n",
       "  'ServiceProcessingTimeInMillis': 52,\n",
       "  'ResultReuseInformation': {'ReusedPreviousResult': False}},\n",
       " 'WorkGroup': 'primary',\n",
       " 'EngineVersion': {'SelectedEngineVersion': 'Athena engine version 3',\n",
       "  'EffectiveEngineVersion': 'Athena engine version 3'}}"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.start_query_execution(\n",
    "    sql=f\"OPTIMIZE {glue_table} REWRITE DATA USING BIN_PACK\",\n",
    "    database=glue_database,\n",
    "    wait=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Vacuum\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "`VACUUM` performs [snapshot expiration](https://iceberg.apache.org/docs/latest/spark-procedures/#expire_snapshots) and [orphan file removal](https://iceberg.apache.org/docs/latest/spark-procedures/#remove_orphan_files). These actions reduce metadata size and remove files not in the current table state that are also older than the retention period specified for the table. For syntax details, see [VACUUM](https://docs.aws.amazon.com/athena/latest/ug/vacuum-statement.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'QueryExecutionId': '717a7de6-b873-49c7-b744-1b0b402f24c9',\n",
       " 'Query': 'VACUUM iceberg_test',\n",
       " 'StatementType': 'DML',\n",
       " 'ResultConfiguration': {'OutputLocation': 's3://aws-athena-query-results-...-us-east-1/717a7de6-b873-49c7-b744-1b0b402f24c9.csv'},\n",
       " 'ResultReuseConfiguration': {'ResultReuseByAgeConfiguration': {'Enabled': False}},\n",
       " 'QueryExecutionContext': {'Database': 'aws_sdk_pandas'},\n",
       " 'Status': {'State': 'SUCCEEDED',\n",
       "  'SubmissionDateTime': datetime.datetime(2023, 3, 16, 10, 50, 41, 14000, tzinfo=tzlocal()),\n",
       "  'CompletionDateTime': datetime.datetime(2023, 3, 16, 10, 50, 43, 441000, tzinfo=tzlocal())},\n",
       " 'Statistics': {'EngineExecutionTimeInMillis': 2229,\n",
       "  'DataScannedInBytes': 0,\n",
       "  'TotalExecutionTimeInMillis': 2427,\n",
       "  'QueryQueueTimeInMillis': 153,\n",
       "  'QueryPlanningTimeInMillis': 30,\n",
       "  'ServiceProcessingTimeInMillis': 45,\n",
       "  'ResultReuseInformation': {'ReusedPreviousResult': False}},\n",
       " 'WorkGroup': 'primary',\n",
       " 'EngineVersion': {'SelectedEngineVersion': 'Athena engine version 3',\n",
       "  'EffectiveEngineVersion': 'Athena engine version 3'}}"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.athena.start_query_execution(\n",
    "    sql=f\"VACUUM {glue_table}\",\n",
    "    database=glue_database,\n",
    "    wait=True,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.14",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}